Introducing Schema Validation in MongoDB

This article is part of Robert Sheldon's continuing series on MongoDB.

Similar to other NoSQL database systems, MongoDB is known for its flexible and variable schema models, unlike relational database systems in which well-defined schemas are essential to ensuring data integrity. In MongoDB, you can add documents to a collection that contain different fields or that include the same fields but with different data types or value ranges. You can even add documents that are completely unrelated. As long as you use proper Binary JSON (BSON) formatting, just about anything goes.

In some cases, however, this flexibility can get to be a little too much, and you might want to impose restrictions on a collection’s documents. In this way, you can better control how data is stored and presented so your applications experience the data in a consistent and reliable manner. For example, you might want to ensure that all documents added to a collection include the name field and that the field always takes a string value.

You can impose such restrictions on a collection’s documents by defining schema validation rules that specify the acceptable fields and their values. MongoDB’s validation capabilities are flexible and simple to implement and can be easily modified when needed. You can also create rules at a very granular level, down to a single field value. MongoDB applies the rules to new documents as they’re inserted into the collection and to existing documents when they’re updated. If a document violates those rules, MongoDB rejects the operation by default.

In this article, I demonstrate how to define validation rules on a collection. The article provides multiple examples of schema definitions that contain different types of validation rules. This is the first of two articles on schema validation. By the end of this article, you should have a good sense of how validation rules work and how to add them to your collections.

Note: For the examples in this article, I used the same MongoDB Atlas and MongoDB Compass environments I used for the previous articles in this series. Refer to the first article for more specifics about setting up these environments.

Introducing the JSON Schema object

You can use a MongoDB Shell command to define schema validation on a collection. You can also use the GUI features in MongoDB Compass, but you still need to understand how to build the validation rules themselves, and MongoDB Shell is a good place to start.

When adding schema validation to a collection, you specify the validator option, which holds the validation rules. To express those rules as a JSON Schema object, you use the $jsonSchema operator within the validator.

You can define the JSON Schema object when you first add your collection to the database or after the collection already exists. In either case, the format you use for the validator option is the same. The following syntax shows this format, which starts with the validator keyword:
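As a rough sketch of that format, the nesting looks like the following. The schemaRules and validationOptions names, along with the single placeholder element, are illustrative assumptions rather than part of MongoDB or the article's own listing.

```javascript
// Placeholder rules; a real schema would list required fields and properties.
const schemaRules = { bsonType: "object" };

// The validator option wraps the $jsonSchema operator,
// and the operator wraps the JSON Schema object.
const validationOptions = {
  validator: {
    $jsonSchema: schemaRules
  }
};

console.log(JSON.stringify(validationOptions, null, 2));
```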

As the syntax shows, the $jsonSchema operator is specified as the value of the validator option. The $jsonSchema operator, which is enclosed in curly brackets, takes the JSON Schema object, which is also enclosed in curly brackets.

The schema object itself is based on draft 4 of the JSON Schema standard. MongoDB omits several elements from the standard, while also extending it to support MongoDB’s BSON data types. A full explanation of the JSON Schema standard and its implementation in MongoDB is beyond the scope of this article, but you can find more information about how MongoDB implements the JSON Schema in the MongoDB topic $jsonSchema.

A good way to learn how to define validation rules is to see them in action. To this end, I’ve created a series of examples that demonstrate the core components that go into a collection’s schema validation object. The examples use the hr database and candidates collection, but you can use any database or collection, preferably one that’s empty and not deployed to a production environment.

If you want to try out these examples yourself, I recommend that you stick with the hr database and candidates collection to keep things simple. At this point in the series, you should have no trouble creating a database and collection. Refer to previous articles in this series if necessary.

When I created the examples, I used the version of MongoDB Shell that’s embedded in the MongoDB Compass GUI. I like this version of Shell because I can easily verify changes to my documents by viewing them in the Compass GUI. That said, if you have MongoDB Shell installed on your system, you can instead use your system’s command-line interface (CLI) to try out these examples. The results are the same in either case.

For this article, I created the examples based on an existing collection (candidates), rather than defining them when creating the collection. This approach makes it easier to run through the examples without needing to drop the collection before re-creating it. It also makes it easier to reuse the statement as you refine the rules.

Adding validation rules to a MongoDB collection

To add validation rules to the candidates collection, we’ll start by using the runCommand database method to call the collMod database command. The command lets us modify a collection’s options, which, in this case, are the validation rules. We’ll use the command to specify the validator option and, within it, the $jsonSchema operator, which defines the JSON Schema object. The following example demonstrates how all this works:
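A minimal sketch of such a command might look like the following. The field names, the date type on dob, the enum and minimum elements, and the minimum value of 3 come from the discussion in this article. The title text, the descriptions, the enum values, the contents of both required arrays, and the data types chosen for skills and yrs_exp are assumptions made for illustration. The command document is assigned to a variable first so it can be inspected, and it is passed to runCommand only when the code runs inside MongoDB Shell.

```javascript
// Sketch of the collMod command document. Titles, descriptions, enum values,
// the required lists, and the skills/yrs_exp types are assumptions.
const schemaCommand = {
  collMod: "candidates",
  validator: {
    $jsonSchema: {
      bsonType: "object",
      title: "Candidate validation rules",
      required: ["name", "dob", "position"],
      properties: {
        name: { bsonType: "string", description: "name must be a string" },
        dob: { bsonType: "date", description: "dob must be a date" },
        position: {
          bsonType: "object",
          required: ["title", "dept"],
          properties: {
            title: { bsonType: "string", description: "title must be a string" },
            dept: {
              bsonType: "string",
              enum: ["R&D", "Marketing", "Sales", "IT"],
              description: "dept must be one of the listed values"
            },
            skills: { bsonType: "array", description: "skills must be an array" },
            yrs_exp: {
              bsonType: "number",
              minimum: 3,
              description: "yrs_exp must be a number of 3 or greater"
            }
          }
        }
      }
    }
  }
};

// In MongoDB Shell, db is predefined; elsewhere (for example, Node.js) this is skipped.
if (typeof db !== "undefined") {
  printjson(db.getSiblingDB("hr").runCommand(schemaCommand));
}
```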

The entire statement is passed into MongoDB Shell as a single command. As noted earlier, the command adds the validation rules to an existing collection (candidates). If you want to add the rules when creating the collection, you must include the validator object as an argument to the createCollection method. The MongoDB topic Specify JSON Schema Validation shows an example of how this is done.
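For comparison, a sketch of the createCollection approach might look like the following. The rules are deliberately shortened, and the candidates_demo collection name is a hypothetical one chosen so the call does not collide with an existing candidates collection.

```javascript
// Hypothetical, shortened rules; a real schema would carry the full definition.
const createOptions = {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name"],
      properties: {
        name: { bsonType: "string", description: "name must be a string" }
      }
    }
  }
};

// Runs only in MongoDB Shell. The call fails if the collection already exists.
if (typeof db !== "undefined") {
  db.getSiblingDB("hr").createCollection("candidates_demo", createOptions);
}
```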

Returning to the example above, the first three lines of the command are fairly standard when defining validation rules:

  1. Invoke the runCommand method on the database object associated with the hr database. The method runs the collMod database command. The command’s first argument is the candidates collection.
  2. Specify the validator option as the second field in the collMod command.
  3. Specify the $jsonSchema operator as the value of the validator option.

The remaining code, enclosed in curly brackets, defines the JSON Schema object that is passed to the $jsonSchema operator. A JSON Schema object is essentially a JSON document. Each line is a schema element that contains a keyword, followed by a value, much like a JSON document in which each field is followed by the field value. The elements are organized hierarchically. In this case, the following four elements sit at the top of the hierarchy:

  • bsonType. Indicates the data type of that particular element. When bsonType is included as a top-level element, as it is here, the data type is object and refers to the JSON Schema object as a whole. This element is often omitted from the hierarchy’s top level.
  • title. Provides a name for the set of validation rules. This element is often omitted from the schema definition.
  • required. Indicates which fields are required in the collection’s documents. The element’s value is an array of string values that list the field names. Any fields within the array must be included in the document. The element is omitted from the schema definition if no fields are required.
  • properties. Defines specific properties associated with the listed fields, which are included as subelements within the properties element, much like an embedded document. Each document field must adhere to the schema defined for that subelement. The properties element is omitted from the schema definition if no field properties need to be defined.

In this case, the top-level properties element includes the three field subelements: name, dob, and position. The subelements for the name and dob fields specify the data type and provide a description for each field. When a data type is specified, the field’s value must conform to that type. The description is used when returning an error message relevant to that subelement.

The third subelement applies to the position field, which is an embedded document. The position subelement includes its own subelements: bsonType, required, and properties.

In this case, too, the properties subelement is broken down further into its own field subelements: title, dept, skills, and yrs_exp. All four of these embedded subelements define the data type and provide a description. There are also a couple of new element types.

  • The dept subelement includes the enum element, which defines the values that the position.dept field can include. The values are defined as an array of strings. No other values can be inserted into this field.
  • The yrs_exp subelement includes the minimum element, with its value set to 3. As a result, the value inserted into the position.yrs_exp field must be 3 or greater.

Those are all the elements that make up this particular JSON Schema object. You can include fewer details in your schema definition, or you can include more. You can also include element types not shown here. The MongoDB topic $jsonSchema includes a list of available element types—or keywords—that MongoDB supports for schema validation.

Verifying a collection’s schema validation rules

After you define validation rules on the candidates collection, you’ll likely want to verify that the rules are working as expected. A good way to do this is to try to insert a document into the collection. For example, the following insertOne command tries to add a document that includes all the fields referenced in the validation rules:
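A sketch of such an insert attempt follows. Apart from the name Drew, the field values are invented for illustration, and the dept value assumes that R&D is one of the values permitted by the enum element.

```javascript
// Field values other than the name Drew are made up for illustration.
const stringDobCandidate = {
  name: "Drew",
  dob: "1985-10-21", // a string, not a date
  position: {
    title: "Senior Developer",
    dept: "R&D",
    skills: ["Java", "SQL"],
    yrs_exp: 11
  }
};

// Runs only in MongoDB Shell; validation is expected to reject the string dob.
if (typeof db !== "undefined") {
  try {
    db.getSiblingDB("hr").getCollection("candidates").insertOne(stringDobCandidate);
  } catch (err) {
    print(err.message);
  }
}
```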

On the surface, it might appear that you should be able to add this document with no problem. However, the validation rules specify that the dob field can take only a date value, while the command tries to add it as a string value. As a result, MongoDB returns the following error, which indicates that there is a type mismatch:

Notice that the error message includes the description element that was defined on the dob field. The message also indicates that the type did not match what was expected. To correct this issue, you need to pass in the dob value as a date type, as shown in the next example:
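A corrected version of the document might look like the following sketch. As before, the values other than the name are invented. The standard JavaScript Date constructor is used here so the snippet also runs outside MongoDB Shell; inside the shell, the ISODate helper is a common alternative.

```javascript
// Same illustrative document, but dob is now a real date value.
const dateDobCandidate = {
  name: "Drew",
  dob: new Date("1985-10-21"),
  position: {
    title: "Senior Developer",
    dept: "R&D",
    skills: ["Java", "SQL"],
    yrs_exp: 11
  }
};

// Runs only in MongoDB Shell; prints the acknowledgment returned by insertOne.
if (typeof db !== "undefined") {
  printjson(db.getSiblingDB("hr").getCollection("candidates").insertOne(dateDobCandidate));
}
```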

You should now be able to insert the document with no problem. MongoDB will then return a confirmation message similar to the following, although with a different insertedId value:

You can also use an updateOne command to confirm that your validation rules are working as expected. For example, the following updateOne command tries to update the Drew document by setting the position.dept value to Dev:
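A sketch of that update follows. Filtering on the name field is an assumption; any filter that matches the Drew document would serve.

```javascript
// Filter and update documents for the attempted change.
const deptFilter = { name: "Drew" };
const deptUpdate = { $set: { "position.dept": "Dev" } };

// Runs only in MongoDB Shell; the update should be rejected by the enum rule.
if (typeof db !== "undefined") {
  try {
    db.getSiblingDB("hr").getCollection("candidates").updateOne(deptFilter, deptUpdate);
  } catch (err) {
    print(err.message);
  }
}
```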

As you’ll recall from the collection’s validation rules, the position.dept value must be one of those specified in the enum array. Dev is not in that array, so the document cannot be updated in this way. If you try to run this command, you’ll receive an error message indicating that Dev was not found in enum.

Another way you can test your validation rules is to try to update a document by changing the position.yrs_exp value to 2, as in the following example:
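This second test can be sketched the same way, again assuming that the document is matched on its name field.

```javascript
// Attempt to set yrs_exp below the minimum of 3.
const expFilter = { name: "Drew" };
const expUpdate = { $set: { "position.yrs_exp": 2 } };

// Runs only in MongoDB Shell; the update should be rejected by the minimum rule.
if (typeof db !== "undefined") {
  try {
    db.getSiblingDB("hr").getCollection("candidates").updateOne(expFilter, expUpdate);
  } catch (err) {
    print(err.message);
  }
}
```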

In this case, the validation rules state that the position.yrs_exp value must be at least 3, so MongoDB will again prevent you from updating the document. Instead, you’ll receive an error message indicating that your comparison failed.

Controlling permitted fields in a collection’s documents

By default, schema validation is concerned only with the fields that are specified within the validation rules. In other words, there is nothing to prevent you from adding other fields to your documents. For example, you might add a document to the candidates collection that includes a field describing the candidate’s personal interests or hobbies.

In some cases, however, you might want to ensure that the only fields included in a document are those defined within the properties element. You can do this by adding the additionalProperties element to your schema definition and setting its value to false, as in the following example:
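One way to picture the change is as a small transformation of an existing schema object. The disallowExtraFields helper and the shortened baseSchema below are hypothetical; in practice, the element is simply added at the top level of the full JSON Schema object before the collMod command is rerun.

```javascript
// Hypothetical helper: returns a copy of a schema with additionalProperties set to false.
function disallowExtraFields(schema) {
  return { ...schema, additionalProperties: false };
}

// Shortened stand-in for the full schema definition.
const baseSchema = {
  bsonType: "object",
  required: ["name"],
  properties: {
    name: { bsonType: "string", description: "name must be a string" }
  }
};

const strictSchema = disallowExtraFields(baseSchema);
console.log(strictSchema.additionalProperties); // prints false
```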

This command is the same as the previous validation example, except that it now includes the additionalProperties element. If you run this command in MongoDB Shell, it will automatically update your validation rules on the candidates collection. You don’t need to take any other steps to update the rules.

Although adding the additionalProperties element is fairly straightforward, you must be careful when doing so. For example, you might expect to be able to run the following insertOne command, which tries to add another document to the candidates collection:

The document does not include any fields that are not defined in the properties element, nor does it appear to violate any rules defined on the individual fields. However, if you try to run this command, you’ll receive an error stating that your document contains a property (field) not defined in the properties element. The error, as it turns out, is related to the _id field.

Each document in a MongoDB collection must include the _id field. If you don’t include the field in your document definition, MongoDB will automatically add it. However, the validation rules, as they’re currently defined, do not specify this field, so any document you try to add to the collection will fail validation.

To address this issue, you must specify the _id field within the properties element, along with the other fields, as shown in the following example:
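The adjustment can be sketched as follows. The allowIdField helper and the shortened strictBase schema are hypothetical stand-ins; the objectId type and the addition of _id to the required element follow the approach described in this section.

```javascript
// Hypothetical helper: adds _id to both the required list and the properties element.
function allowIdField(schema) {
  return {
    ...schema,
    required: ["_id", ...(schema.required || [])],
    properties: { _id: { bsonType: "objectId" }, ...schema.properties }
  };
}

// Shortened stand-in for a schema that already disallows extra fields.
const strictBase = {
  bsonType: "object",
  required: ["name"],
  additionalProperties: false,
  properties: {
    name: { bsonType: "string", description: "name must be a string" }
  }
};

const schemaWithId = allowIdField(strictBase);
console.log(Object.keys(schemaWithId.properties));
```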

As you can see, the top-level properties element now includes a subelement for the _id field, with its data type defined as objectId. I also included the _id field in the required element just for completeness.

After you update the schema, you can verify your changes by rerunning the previous insertOne command:

You should now be able to insert the document with no problem. However, this only shows that you can add a document that is in the expected format. Another test you can perform is to try to add a document that contains a field not defined in the properties element, as in the following example:

The command attempts to add a document that includes the region field. If you try to run this command, you will receive an error message stating that region is an additional property. To address this issue, you can simply remove the offending field, as in the following insertOne statement:
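The two insert attempts described above can be sketched together. All field values are invented for illustration, and the dept value again assumes that R&D is permitted by the enum element.

```javascript
// All field values here are invented for illustration.
const withRegion = {
  name: "Avery",
  dob: new Date("1990-03-15"),
  position: { title: "Data Analyst", dept: "R&D", skills: ["SQL"], yrs_exp: 6 },
  region: "Northwest" // not defined in the properties element
};

// Same document with the extra field left out.
const { region, ...withoutRegion } = withRegion;

// Runs only in MongoDB Shell.
if (typeof db !== "undefined") {
  const candidates = db.getSiblingDB("hr").getCollection("candidates");
  try {
    candidates.insertOne(withRegion); // the region field should fail validation
  } catch (err) {
    print(err.message);
  }
  printjson(candidates.insertOne(withoutRegion));
}
```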

When you run this version of the command, you should be able to insert the document with no problem.

Working with null values when validating schema

In some cases, you might want to be able to insert a null for a field value when adding or updating a document. For example, an application might automatically default to null if a value is not known, rather than excluding the field from the document. If null values are going to be used, you must take them into account when configuring validation rules.

For example, suppose the position.title field might take a null value in some cases, as in the following updateOne command:
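A sketch of such an update follows. Which document the original command targets is not stated here, so the filter on the Drew document is an assumption.

```javascript
// Attempt to set position.title to null.
const titleFilter = { name: "Drew" };
const titleUpdate = { $set: { "position.title": null } };

// Runs only in MongoDB Shell; fails while the rules permit only string values.
if (typeof db !== "undefined") {
  try {
    printjson(db.getSiblingDB("hr").getCollection("candidates").updateOne(titleFilter, titleUpdate));
  } catch (err) {
    print(err.message);
  }
}
```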

If you try to run this statement, MongoDB will generate an error message stating that the type does not match. This is because the validation rules currently state that the position.title field permits string values only. However, you can easily fix this issue by updating the position.title subelement in the schema definition, as in the following example:
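The revised subelement might look like the following fragment. The description text is an assumption, and the fragment is meant to replace the title subelement inside position.properties in the full schema definition rather than to be run on its own against the collection.

```javascript
// Fragment only: the title subelement now accepts either a string or a null.
const positionTitleRule = {
  bsonType: ["string", "null"],
  description: "title must be a string or null"
};

console.log(JSON.stringify(positionTitleRule));
```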

Notice that I’ve updated the bsonType value for the position.title subelement. The value is now an array that includes both string and null as acceptable types. After you update the schema definition, you can then try to rerun the updateOne command. MongoDB should now update the document and return the following message:

That’s all there is to handling null values when defining validation rules. You can do this for any field in which null might be an acceptable value. That said, you need to do this only for those fields included in the properties element.

Adding query operators to your schema validation rules

In some cases, you might want to add logic to your schema definition to better control field values when inserting or updating documents. For this, you can use query operators to build expressions that apply different types of logic to your documents.

For example, suppose you want to ensure that all job candidates are at least 21 years old. You can update your schema definition to include this logic, along with the original JSON Schema object, as shown in the following example:
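A sketch of such a validator follows. The operators and the 21-year amount come from the discussion in this section. The use of the $$NOW system variable as the current-date source is an assumption, and the shortened candidateSchema stands in for the full JSON Schema object, which would need to be swapped in before running the command. Note that $dateSubtract requires MongoDB 5.0 or later.

```javascript
// Stand-in for the full JSON Schema object; shortened here for space.
const candidateSchema = {
  bsonType: "object",
  required: ["name", "dob"],
  properties: {
    dob: { bsonType: "date", description: "dob must be a date" }
  }
};

// Both conditions in the $and array must hold for a document to pass validation.
const ageCheckCommand = {
  collMod: "candidates",
  validator: {
    $and: [
      {
        $expr: {
          $lte: [
            "$dob",
            // Assumes $$NOW as the current-date source.
            { $dateSubtract: { startDate: "$$NOW", unit: "year", amount: 21 } }
          ]
        }
      },
      { $jsonSchema: candidateSchema }
    ]
  }
};

// In MongoDB Shell, after swapping in the complete schema:
// db.getSiblingDB("hr").runCommand(ageCheckCommand);
console.log(JSON.stringify(ageCheckCommand, null, 2));
```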

The first thing to notice is that the first validator argument now begins with the $and logical operator, which is followed by two conditions: an expression that calculates the age based on the dob field and the JSON Schema object definition, which is the same one from the previous validation example. Both of these conditions must be met to be able to insert or update a document.

The JSON Schema object should need no further explanation because nothing has changed, so let’s take a closer look at the expression, which begins with the $expr evaluation operator. The operator lets us define an aggregation expression within our query. The aggregation expression is everything enclosed in the curly brackets that follow the $expr keyword.

That expression uses the $lte comparison operator to indicate that the dob value (represented as $dob) must be less than or equal to the value returned by the $dateSubtract operator. The $dateSubtract operator subtracts 21 years from the current date to arrive at a date value that is then compared to the dob value. If the dob value falls on or before the returned date value, the document can be added.

After you update the schema definition, you can test out your changes by running the following statement.

The statement uses 2004-7-2 as the dob value, which should cause the statement to fail. If, by the time you read this article, that date is 21 or more years in the past, you’ll need to adjust the value. The idea is to get the command to generate an error as a result of trying to add a dob value that is below the required age. When the command fails, you should get an error message stating that the expression did not match.
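If you prefer test values that do not depend on when you run the examples, one option is to compute them relative to the current date. The yearsAgo helper below is hypothetical, and the comparison is a local illustration of the same less-than-or-equal logic, not a substitute for the server-side check.

```javascript
// Hypothetical helper: returns a Date the given number of years before the reference date.
function yearsAgo(years, reference) {
  const result = new Date(reference.getTime());
  result.setFullYear(result.getFullYear() - years);
  return result;
}

const today = new Date();
const cutoff = yearsAgo(21, today);       // latest dob that satisfies the rule
const tooYoungDob = yearsAgo(20, today);  // should fail the age check
const oldEnoughDob = yearsAgo(30, today); // should pass the age check

console.log(tooYoungDob.getTime() <= cutoff.getTime());  // prints false
console.log(oldEnoughDob.getTime() <= cutoff.getTime()); // prints true
```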

Next, run the following insertOne command, which is the same as the previous one, except that the dob value is now definitely more than 21 years ago:

The command should run with no problem because it does not violate any of the validation rules.

There are, of course, plenty of other ways you can define your expressions so you can include different logic in your validator object. You can, in fact, define an expression without including a JSON Schema object. For more information about using query operators, see the MongoDB topic Specify Validation With Query Operators.

Getting started with MongoDB validation rules

For the examples in this article, I used MongoDB’s default validation settings. In the next article, I plan to continue the discussion on validation rules and provide more details on how you can override the default behavior. I’ll also discuss validation rules as they apply to documents that already exist within a collection. In the meantime, I suggest you review the MongoDB topic Schema Validation, which introduces you to validation rules. I think you’ll find that schema validation is a valuable tool for working with MongoDB data, although you’ll likely want to limit its use to more mature applications in which the schema is relatively stable.

About the author

Robert Sheldon

Robert is a freelance technology writer based in the Pacific Northwest. He’s worked as a technical consultant and has written hundreds of articles about technology for both print and online publications, with topics ranging from predictive analytics to 5D storage to the dark web. He’s also contributed to over a dozen books on technology, developed courseware for Microsoft’s training program, and served as a developmental editor on Microsoft certification exams. When not writing about technology, he’s working on a novel or venturing out into the spectacular Northwest woods.